Annealed Sequential Monte Carlo for Bayesian Logistic Regression
This analysis applies Annealed Sequential Monte Carlo (SMC) to perform Bayesian inference on the Wisconsin Breast Cancer dataset. We estimate posterior distributions for the coefficients of a logistic regression model that classifies tumors as malignant or benign.
Load and Explore Data
# Load the breast cancer dataset
X, y = load_breast_cancer()

println("Dataset dimensions: $(size(X))")
println("Number of features: $(size(X, 2))")
println("Number of samples: $(size(X, 1))")
println("\nClass distribution:")
println(" Benign (0): $(sum(y .== 0))")
println(" Malignant (1): $(sum(y .== 1))")
Downloading dataset from URL...
Dataset shape: (699, 11)
✓ Dataset loaded successfully!
Samples: 699
Features: 9
Benign (0): 458
Malignant (1): 241
Dataset dimensions: (699, 9)
Number of features: 9
Number of samples: 699
Class distribution:
Benign (0): 458
Malignant (1): 241
We use annealed Sequential Monte Carlo to sample from the posterior distribution of the logistic regression coefficients. The algorithm moves gradually from the prior (β = 0) to the posterior (β = 1). Here β is the annealing (inverse temperature) parameter, not a regression coefficient.
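A common way to pick the next β adaptively is to bisect on the effective sample size (ESS) of the incremental weights. The sketch below assumes likelihood tempering, where the target at level β is proportional to prior × likelihood^β, and assumes the particles are equally weighted before the step. The names `ess`, `next_beta`, and `loglik` are hypothetical and are not taken from the sampler used in this analysis.

```julia
# Effective sample size of a set of unnormalized log-weights.
function ess(logw::AbstractVector{<:Real})
    m = maximum(logw)
    w = exp.(logw .- m)          # subtract the max for numerical stability
    return sum(w)^2 / sum(abs2, w)
end

# Find the next inverse temperature by bisection so that the ESS of the
# incremental weights (beta_new - beta_old) * loglik is about ess_frac * N.
# Assumption: particles are equally weighted at beta_old.
function next_beta(loglik::AbstractVector{<:Real}, beta_old::Real;
                   ess_frac::Real=0.5, tol::Real=1e-8)
    target = ess_frac * length(loglik)
    # If jumping straight to the posterior keeps enough ESS, do that.
    ess((1.0 - beta_old) .* loglik) >= target && return 1.0
    lo, hi = float(beta_old), 1.0
    while hi - lo > tol
        mid = (lo + hi) / 2
        if ess((mid - beta_old) .* loglik) >= target
            lo = mid             # step is still safe, try a larger one
        else
            hi = mid             # step degrades the ESS too much
        end
    end
    return lo
end

# Toy usage with made-up per-particle log-likelihoods.
loglik = [-10.0, -12.0, -30.0, -11.0, -50.0]
b = next_beta(loglik, 0.0; ess_frac=0.5)
```

With this rule, widely spread log-likelihoods force small steps in β, which is one reason an adaptive schedule can need many annealing steps.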
# SMC hyperparameters
N_particles = 500
mcmc_steps = 5
step_scale = 0.2
ess_threshold = 0.5

println("\nRunning Annealed SMC...")
println(" Particles: $N_particles")
println(" MCMC steps per iteration: $mcmc_steps")
println(" Step scale: $step_scale")
println(" ESS threshold: $ess_threshold")

# Run the sampler
@time particles, particle_weights, betas, acc_hist = SMC.annealed_smc(
    X, y;
    N=N_particles,
    mcmc_steps=mcmc_steps,
    step_scale=step_scale,
    ess_frac=ess_threshold
)

println("\nSMC completed!")
println(" Total annealing steps: $(length(betas))")
println(" Final β: $(betas[end])")
println(" Mean acceptance rate: $(round(mean(acc_hist), digits=3))")
Running Annealed SMC...
Particles: 500
MCMC steps per iteration: 5
Step scale: 0.2
ESS threshold: 0.5
1297.647315 seconds (1.53 G allocations: 1.103 TiB, 5.02% gc time, 0.05% compilation time)
SMC completed!
Total annealing steps: 29387
Final β: 1.0
Mean acceptance rate: 0.854
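The sampler returns `particles` and `particle_weights`, but their layout is not shown in this analysis. Assuming `particles` is an N × d matrix with one particle per row and `particle_weights` holds nonnegative weights, weighted posterior means and equal-tailed credible intervals could be computed along these lines. `weighted_mean` and `weighted_quantile` are hypothetical helper names.

```julia
# Weighted posterior mean of each coefficient (one column per coefficient).
# Assumption: P is N × d with one particle per row.
weighted_mean(P::AbstractMatrix, w::AbstractVector) = (P' * w) ./ sum(w)

# Weighted quantile of a single coefficient, read off the cumulative
# weights of the sorted particle values.
function weighted_quantile(x::AbstractVector, w::AbstractVector, q::Real)
    idx = sortperm(x)
    cw = cumsum(w[idx]) ./ sum(w)
    k = findfirst(>=(q), cw)
    # Fall back to the largest value if rounding keeps cw[end] below q.
    return x[idx[something(k, length(x))]]
end

# Toy usage with made-up particles and weights.
P = [1.0 10.0; 2.0 20.0; 3.0 30.0]
w = [0.25, 0.5, 0.25]
post_mean = weighted_mean(P, w)
ci_low = weighted_quantile(P[:, 1], w, 0.025)
ci_high = weighted_quantile(P[:, 1], w, 0.975)
```

If the particles were resampled at the final step, the weights are uniform and these reduce to the ordinary mean and empirical quantiles.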
This analysis demonstrated Bayesian logistic regression using Annealed Sequential Monte Carlo on the Wisconsin Breast Cancer dataset. Key findings:
Annealing Schedule: The adaptive temperature schedule required 29387 annealing steps (the length of `betas`) to move from the prior to the posterior.
Key Predictive Features: The features with the strongest effects are shown above. A credible interval that excludes zero indicates that the posterior assigns high probability to a nonzero effect of that sign.
Model Performance: The posterior mean classifier achieves $(round(accuracy * 100, digits=1))% accuracy on the training data.
Posterior Uncertainty: The posterior distributions show varying degrees of uncertainty across features, with some coefficients more precisely estimated than others.
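The computation behind the training accuracy is not shown here. One plausible way to obtain it, assuming labels coded 0/1 as in the class counts above and a design matrix `X` that already contains any intercept column, is to plug the posterior mean coefficients into the logistic function and threshold at 0.5. `sigmoid`, `posterior_mean_accuracy`, and `theta_mean` are hypothetical names.

```julia
# Logistic function.
sigmoid(z) = 1 / (1 + exp(-z))

# Training accuracy of the classifier that plugs in the posterior mean
# coefficients and thresholds the predicted probability at 0.5.
# Assumption: y is coded 0/1 and X includes any intercept column.
function posterior_mean_accuracy(X::AbstractMatrix, y::AbstractVector,
                                 theta_mean::AbstractVector)
    p = sigmoid.(X * theta_mean)
    yhat = p .>= 0.5
    return sum(yhat .== (y .== 1)) / length(y)
end

# Toy usage with made-up data.
X_toy = [1.0 2.0; 1.0 -2.0; 1.0 0.5; 1.0 -0.5]
y_toy = [1, 0, 1, 1]
acc_toy = posterior_mean_accuracy(X_toy, y_toy, [0.0, 1.0])
```

Averaging the predicted probabilities over all weighted particles, rather than plugging in the posterior mean, would give the posterior predictive classifier instead. The two can differ when the posterior is wide.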
Next Steps
Cross-validation: Implement k-fold CV to assess out-of-sample performance
Model comparison: Compare with simpler models using marginal likelihood
Feature selection: Use posterior distributions to identify sparse models
Sensitivity analysis: Test robustness to prior specifications
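For the model comparison step, a tempered SMC run can produce a marginal likelihood estimate as a by-product: the log evidence is the sum, over annealing steps, of the log of the weighted average incremental weight. The sketch below assumes likelihood tempering with a proper prior, and that the per-particle log-likelihoods and normalized log-weights are available at each step. `logsumexp` and `log_evidence_increment` are hypothetical helpers, not part of the sampler used above.

```julia
# Numerically stable log(sum(exp.(x))).
function logsumexp(x::AbstractVector{<:Real})
    m = maximum(x)
    return m + log(sum(exp.(x .- m)))
end

# Contribution of one annealing step to the log marginal likelihood.
# `logW` are the normalized log-weights before the step and `loglik`
# the per-particle log-likelihoods at the current particle positions.
log_evidence_increment(logW, loglik, beta_old, beta_new) =
    logsumexp(logW .+ (beta_new - beta_old) .* loglik)

# Toy usage: four equally weighted particles with made-up log-likelihoods.
logW = fill(log(1 / 4), 4)
loglik = [-2.0, -2.5, -1.5, -3.0]
inc = log_evidence_increment(logW, loglik, 0.0, 0.5)
```

Summing these increments over the whole schedule gives the log evidence estimate, and the difference between two models' estimates is an estimate of their log Bayes factor.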